TECHNIQUES FOR IDENTIFYING AND 
ACCESSING INFORMATION OF INTEREST 
TO A USER IN A NETWORK 
ENVIRONMENT WITHOUT 
COMPROMISING THE USER'S PRIVACY 

CROSS-REFERENCES TO RELATED 
APPLICATIONS 

This application claims priority from the following 
applications, the entire disclosures of which are herein 
incorporated by reference for all purposes: 

(1) U.S. Provisional Patent Application No. 60/206,190, 
entitled "SYSTEM AND METHOD FOR PROVIDING 
INFORMATION TO USERS IN A NETWORK ENVI- 
RONMENT WITHOUT COMPROMISING USER PRI- 
VACY" filed May 22, 2000; and 

(2) U.S. Provisional Patent Application No. 60/205,938, 
entitled "SYSTEM AND METHOD FOR CREATING 
VIRTUAL COMMUNITIES WHILE PRESERVING 
THE PRIVACY OF USERS IN THE VIRTUAL COM- 
MUNITY" filed May 18, 2000. 

The present application also incorporates herein by ref- 
erence for all purposes the entire disclosures of the follow- 
ing applications which are filed concurrently with this 
application: 

(1) U.S. patent application Ser. No. 09/861,082 (currently 
pending), entitled "TECHNIQUES FOR IDENTIFYING 
VIRTUAL USER GROUPS IN A NETWORK ENVI- 
RONMENT WITHOUT COMPROMISING USER PRI- 
VACY"; 

(2) U.S. patent application Ser. No. 09/861,471 (currently 
pending), entitled "TECHNIQUES FOR JOINING VIR- 
TUAL USER GROUPS IN A NETWORK ENVIRON- 
MENT AND RECEIVING INFORMATION RELATED 
TO THE VIRTUAL USER GROUPS WITHOUT COM- 
PROMISING USER PRIVACY"; and 

(3) U.S. patent application Ser. No. 09/861,094 (currently 
pending), entitled "TECHNIQUES FOR SHARING 
CONTENT INFORMATION WITH MEMBERS OF A 
VIRTUAL USER GROUP IN A NETWORK ENVIRON- 
MENT WITHOUT COMPROMISING USER PRI- 
VACY". 

BACKGROUND OF THE INVENTION 

The present invention relates generally to identifying and 
accessing information stored by communication and infor- 
mation networks. More particularly, the present invention 
describes techniques for identifying and accessing information
of interest to a user while preserving the privacy of the
user.

With the widespread use of computers, an expanding 
telecommunication network, and the rising popularity of 
communication networks such as the Internet, an increasing 
amount of information is contained in documents stored by 
computer systems coupled to the communication networks. 
Users can access these documents by using computer sys- 
tems coupled to the communication networks. For example, 
a user can browse the Internet and access web pages stored 
by servers coupled to the Internet. 

Computer systems connected to communication networks 
such as the Internet can generally be classified as "clients" 
or "servers" depending on the role the computer systems 
play with respect to requesting information or
storing/providing information. Computer systems which are used
by users to access information are typically called "client"
computers. Computer systems which store information and
provide the information to client computers are usually
referred to as "server" systems. Accordingly, server systems
are responsible for receiving information requests from
client systems, performing processing required to satisfy the
requests, and forwarding the results/information corresponding
to the information requests back to the requesting
client systems. The processing required to satisfy the client
request may be performed by a single server system or may
alternatively be delegated to other servers connected to the
communication network, such as the Internet. It should be
apparent that a particular computer system may function
both as a server and a client.

In the World Wide Web ("Web") environment, information
resources are typically stored in the form of hypertext
documents called "web pages" which can be accessed and
read by users of the Web. A web page may incorporate any
combination of text, graphics, audio and video content,
software programs, and other data. Web pages may also
contain hypertext links to other web pages. Web pages are
typically stored on web servers or content servers coupled to
the Internet. Each web page is uniquely identified by an
address called a Uniform Resource Locator (URL) that
enables users to access the web page.

Users typically access web pages using a program called
a "web browser" which generally executes on a client
computer coupled to the Internet. The web browser is a type
of client application that enables users to select, retrieve, and
perceive information contained in web pages. Examples of
browsers include the Internet Explorer browser program
provided by Microsoft Corporation, the Netscape Navigator
browser provided by Netscape Corporation, and others.
Users generally access web pages by providing URL information
to the browser, either directly or indirectly, and the
browser responds by retrieving the web page corresponding
to the user-provided URL from the Internet. The retrieved
web page is then displayed to the requesting user on the
client computer.

Due to the vast volume of information available via
communication networks such as the Internet, it is becoming
increasingly difficult for a user to identify documents which
contain information of interest to the user or documents
which are relevant to the user. For example, in a Web
environment, a user may be interested in locating web pages
containing information on a particular topic, e.g., Thai
cooking. In a Web environment, the user may locate the
relevant web pages by accessing one or more web servers,
and browsing through web pages stored by the one or more
web servers to identify web pages containing information
related to Thai cooking. However, searching for web pages
in this manner is a non-trivial task because the user does not
typically know which web servers store information of
interest to the user. Further, since each web server may store
a vast number of web pages, in order to find web pages
containing information of interest to the user (e.g., web
pages containing information related to Thai cooking), the
user is often forced to sift through large volumes of information
and web pages, most of which are irrelevant to the
user. As a result, the task of identifying relevant web pages
can be very time consuming and frustrating to the user, and
may not yield the results desired by the user.

In order to alleviate the above problem, most users
generally use programs which help identify relevant documents
from a large pool of documents. These programs are
commonly referred to as search engines and are generally
executed by servers coupled to the communication network.
Examples of search engines in the Internet environment
include search engines provided by Yahoo, Google, Lycos,
Excite, Altavista, and the like which enable users to identify
web pages of interest to the user.

Search engines typically use a crawler or a spider to find 
information about documents stored by the communication 
network which are accessible to the search engine and which 
can be located and searched using the search engine. For 
example, in a Web environment, a crawler may access web 
pages and URL links to other web pages embedded in the 
web pages, and so on. For each web page accessed by the 
crawler, the crawler discovers information about the web 
page including the URL of the web page, the contents of the 
web page, the web server storing the web page, and the like. 
The information collected by a crawler is usually stored by 
the server providing the search engine in the form of an 
index. 
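
By way of illustration only, the crawling behavior described above may be sketched as follows. The sketch assumes a small in-memory stand-in for the Web; the `PAGES` table, the example URLs, and the `crawl` function are hypothetical names introduced here and do not describe any particular search engine.

```python
from collections import deque

# Hypothetical stand-in for the Web: URL -> (page contents, embedded links).
PAGES = {
    "http://a.example/": ("thai cooking basics",
                          ["http://a.example/curry", "http://b.example/"]),
    "http://a.example/curry": ("green curry recipe", ["http://a.example/"]),
    "http://b.example/": ("travel notes", []),
}

def crawl(start_url, pages):
    """Visit pages breadth-first, following embedded links, and record
    the URL and contents discovered for each page."""
    seen = {start_url}
    queue = deque([start_url])
    records = []
    while queue:
        url = queue.popleft()
        if url not in pages:
            continue  # link points to a page that cannot be retrieved
        contents, links = pages[url]
        records.append({"url": url, "contents": contents})
        for link in links:
            if link not in seen:  # visit each page at most once
                seen.add(link)
                queue.append(link)
    return records

records = crawl("http://a.example/", PAGES)
```

The records collected in this manner are the raw material from which an index may subsequently be built.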

An index built by a search engine generally facilitates 
identification of documents based on criteria related to the 
documents or their contents. The criteria may include words 
occurring in the documents, concepts or topics to which the 
documents relate, subject matter of the documents, and the 
like. The structure of an index may vary based on the search 
engine. For example, in a Web environment, a particular 
search engine may prepare an index mapping words found 
in a plurality of web pages to the URLs corresponding to the 
web pages. In another index, the information may be 
indexed based on titles, headings, subheadings, etc. found in 
the web pages, or based upon concepts and topics extracted
from the contents of the web pages, and so on. In general, indices
are built in a way that facilitates the identification of the 
documents and/or locations of the documents. In a Web 
environment, the locations of documents may be identified 
by URLs corresponding to the web pages. 
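
Merely as an example, an index mapping words to URLs, as described above, may be sketched as follows. This is a minimal sketch under stated assumptions: the `build_index` function, the whitespace-based word splitting, and the example URLs are hypothetical, and an actual search engine may structure its index quite differently.

```python
from collections import defaultdict

def build_index(pages):
    """Map each word to the set of URLs of the pages containing it.

    `pages` is a dict of URL -> page text (an assumed input format).
    """
    index = defaultdict(set)
    for url, text in pages.items():
        for word in text.lower().split():
            index[word].add(url)
    return index

index = build_index({
    "http://a.example/thai": "Thai cooking",
    "http://b.example/trip": "Thai travel",
})
```

In this sketch, looking up `index["thai"]` yields both example URLs, while `index["cooking"]` yields only the first.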

A search engine also provides a search tool which allows 
users to identify documents of interest using information 
stored in the index generated by the search engine. In order 
to identify documents of interest, a user generally configures 
a query using a client computer. The query may contain 
query terms which describe, for example, a topic or concept 
for which the user is interested in finding more information. 
For example, if the user is interested in finding information 
on Thai cooking, the query terms may include the words 
"Thai" and "cooking." 

The user-configured query is then communicated from the 
user's client computer to a remote server system executing 
a search engine. Upon receiving the search query, the search 
engine executing on the remote server identifies documents 
(or locations of the documents) which match or satisfy the 
user query based upon information stored in the index used 
by the search engine. The search engine may use various 
techniques to determine documents which are relevant to the 
search query received from the user's client system. Infor- 
mation identifying the relevant documents or their locations 
determined by the search engine is then communicated from 
the search engine server to the user's client computer. The 
user may then use the information received from the search 
engine to access one or more of the relevant documents. 
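
The query processing described above may be illustrated by the following sketch, which assumes a word-to-URL index and treats a query as a conjunction of its terms. The `INDEX` table and the `search` function are hypothetical; conventional search engines may use other matching and ranking techniques.

```python
# Hypothetical word -> URLs index of the kind a search engine might hold.
INDEX = {
    "thai": {"http://a.example/thai", "http://b.example/trip"},
    "cooking": {"http://a.example/thai", "http://c.example/bake"},
    "travel": {"http://b.example/trip"},
}

def search(index, query):
    """Return the sorted URLs of documents containing every query term."""
    terms = query.lower().split()
    if not terms:
        return []
    matches = set(index.get(terms[0], set()))
    for term in terms[1:]:
        matches &= index.get(term, set())  # keep only documents with all terms
    return sorted(matches)

results = search(INDEX, "Thai cooking")
```

Here the query "Thai cooking" matches only the document indexed under both words.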

Some search engines also perform searches implicitly 
without receiving specific user input based on the contents 
of documents (e.g., web pages) viewed by the user. These 
search engines use the contents of the document being 
browsed/viewed by the user as a search query which is 
communicated from the user computer to the search engine 
server. Based on the contents of the document being viewed 
by the user and based upon index information used by the 
search engine, the search engine identifies documents of 
interest to the user. Information related to the documents
identified by the search engine is then communicated to the
user system. The information may then be presented to the
user via a pop-up screen which appears on an output device
of the user's computer system. For example, in a Web
environment, a window may appear on the user's display
device listing URLs corresponding to documents identified
by the search engine to be of interest to the user based on the
contents of the documents presently viewed by the user.
Examples of companies which provide such implicit search
engines include Nano (http://www.nano.com/), Kenjin
(http://www.autonomy.com), Third Voice
(http://www.thirdvoice.com/), Flyswat (http://www.flyswat.com),
Gurunet (http://www.gurunet.com), Annotate
(http://www.annotate.net/) and Alexa (http://www.alexa.com/).
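
One way in which the contents of a viewed document might be turned into an implicit query is sketched below. The sketch is an assumption made for illustration and does not describe the method used by any of the companies named above; the `STOPWORDS` set, the `implicit_query` function, and the choice of the most frequent words as query terms are all hypothetical.

```python
import re
from collections import Counter

# Hypothetical list of common words to ignore when forming the query.
STOPWORDS = {"the", "a", "an", "and", "of", "to", "in", "is", "for", "with"}

def implicit_query(page_text, max_terms=3):
    """Derive query terms from a viewed page by taking its most
    frequent words, excluding common stopwords."""
    words = re.findall(r"[a-z]+", page_text.lower())
    counts = Counter(w for w in words if w not in STOPWORDS)
    return [word for word, _ in counts.most_common(max_terms)]

terms = implicit_query("Thai curry and Thai basil: cooking curry with Thai basil")
```

It should be noted that, in the conventional arrangement described above, the derived query is communicated from the user computer to the search engine server.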

In a Web environment, the relevant documents may be
web pages which may be identified by URLs. Accordingly,
the search engine may communicate a list of URLs of
interest to the user to the user's client system in response to
the user query. The user may then select one or more URLs
from the list of URLs and access web pages corresponding
to the selected URLs. When the user selects a URL, the URL
request is sent to a web server storing the web page
corresponding to the URL, and the web server responds by
communicating the requested web page to the user's client
computer system. The server executing the search engine
may act as a conduit forwarding the selected web page
received from the web server to the user client computer
system.

While conventional search engines simplify the process of
identifying documents containing information of interest to
a user, they also compromise the user's privacy. This is
because conventional search engine servers frequently track
and/or mine the user's browsing activities and track information
provided by the user to the search engine. For
example, several conventional search engines mine, without
the user's permission, information contained in user search
queries (which may contain information of a sensitive and
private nature) provided to the search engines. Several
conventional search engines also track the contents of documents
(e.g. web pages) accessed by the user using the search
engine. For example, in a Web environment, conventional
search engines track the web pages accessed by the user, the
content of the web pages, transactions performed by the user
using the web pages, and other like information without the
user's permission.

The information mined or tracked by conventional search
engines is then used to ascertain information about the user's
interests, likes/dislikes, the user's shopping preferences,
information related to the user's use of the Internet, and
other information related to the user and the user's behavior.
Since users generally have a tendency to use a particular
search engine to perform searching, over a period of time,
the particular search engine is capable of building a fairly
detailed profile of the user and the user's behavior.

The user information collected by the search engines and
the user profile information built by the search engines,
which may be sensitive in nature and contain confidential
information, may then be distributed or even sold by providers
of search engines to entities such as advertising
agencies, government agencies, insurance companies, business
entities, and the like. This may result in the user being
subjected to unsolicited spam mail messages, unwelcome
advertisements, credit card fraud, mail fraud, banking fraud,
and other unwelcome activities. As a result, the use of a
conventional search engine executing on a remote server can
severely compromise a user's privacy and security. Further,
since the information collected by the search engines is
typically stored on a server system which is located at a
remote location from the user's computer system, the user
has very little control over the collection and dissemination
of the information.

In light of the above, there is a need for techniques which
allow a user to identify and access documents of interest to
the user (e.g., web pages in a Web environment) without
compromising the user's privacy and security.

SUMMARY OF THE INVENTION

According to the present invention, techniques are provided
which allow a user to identify and access documents
(e.g., web pages) of interest to the user in a network
environment without compromising the user's privacy.
More particularly, according to an embodiment of the
present invention, the user system receives index information
which is used to identify documents of interest to the
user at the user system itself without having to
provide any user-related information to search engines
executing on remote servers. The present invention preserves
user privacy by controlling and minimizing the
communication and collection of user-related information
from the user system. Merely by way of example, the present
invention allows users to identify and access web pages
from web servers coupled to a communication network such
as the Internet without compromising user privacy.

According to an embodiment of the present invention,
techniques are provided which enable a user system to
access a first document from a plurality of documents stored
by a plurality of web servers. In this embodiment, an index
server determines index information to be communicated to
the user system, the index information comprising information
identifying the plurality of documents stored by the
plurality of web servers and information related to the
contents of the plurality of documents. The index server
communicates the index information to the user system. The
user system is configured to identify a first set of documents
from the plurality of documents using the index information
received from the index server, the first set of documents
including the first document, to receive a signal indicating
selection of the first document from the first set of
documents, and responsive to the signal, to access the
selected first document from a web server storing the first
document. According to the teachings of the present
invention, the user system is configured to identify the first
set of documents substantially free from interaction with the
index server and the plurality of web servers.

According to another embodiment, the present invention
provides techniques for identifying and accessing a first
document from a plurality of documents stored by a plurality
of servers using a data processing system. In this
embodiment, the data processing system is configured to
receive index information from an index server, the index
information comprising information identifying the plurality
of documents stored by the plurality of servers and information
related to the contents of the plurality of documents.
The data processing system is configured to identify a first
set of documents from the plurality of documents using the
index information received from the index server, the first
set of documents including the first document. According to
the teachings of the present invention, the data processing
system is configured to identify the first set of documents
substantially free from any interaction with the plurality of
servers and the index server. The data processing system is
also configured to receive a signal indicating selection of the
first document from the first set of documents, and to access
the selected first document from a server storing the first
document in response to the signal.

Various additional objects, features and advantages of the
present invention can be more fully appreciated with reference
to the detailed description and accompanying drawings
that follow.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 is a simplified block diagram of a distributed
computer network which may incorporate an embodiment of
the present invention;

FIG. 2 is a simplified block diagram of a computer system
according to an embodiment of the present invention;

FIG. 3 is a simplified high-level flowchart depicting
processing performed according to an embodiment of the
present invention to identify and access documents of interest
to a user without compromising the user's privacy;

FIG. 4 depicts a portion of index information that may be
communicated from an index server to a user system according
to an embodiment of the present invention; and

FIG. 5 depicts information stored by a user system and
modules which are executed by the user system to provide
features according to an embodiment of the present invention.

DESCRIPTION OF THE SPECIFIC
EMBODIMENTS

According to the present invention, techniques are provided
which allow a user to identify and access documents
(e.g., web pages) of interest to the user in a network
environment without compromising the user's privacy.
More particularly, according to an embodiment of the
present invention, the user can identify and access documents
of interest to the user at the user system itself without
having to provide user-related information to search
engines executing on remote servers. The present invention
preserves user privacy by controlling and minimizing the
tracking and communication of user-related information
from the user system. Techniques according to the present
invention allow the user to control the dissemination of
information related to documents and their contents
accessed by the user. Merely by way of example, the present
invention allows a user to identify web pages of interest to
the user and to access the relevant web pages from web
servers coupled to a communication network such as the
Internet without compromising user privacy.

The invention has been described below using a Web-based
embodiment of the present invention which is used to
identify and access web pages of interest to a user. It should
however be apparent that the present invention is not
restricted to the Web environment, and may also be used in
other network environments such as an intranet, a WAN, a
wireless network, and the like. Additionally, the present
invention can be used to identify and access other types of
documents besides web pages.

FIG. 1 is a simplified block diagram of a distributed
computer network 100 which may incorporate an embodiment
of the present invention. As shown, computer network
100 comprises a number of computer systems coupled to a
communication network 108 via communication links 110.
The computer systems depicted in FIG. 1 include a user
computer system 102, web server systems 104-1, 104-2, and
104-3, and an index server 106. Distributed computer network
100 depicted in FIG. 1 is merely illustrative of an
embodiment incorporating the present invention and does
not limit the scope of the invention as recited in the claims.
One of ordinary skill in the art would recognize other
variations, modifications, and alternatives. For example, a
plurality of user systems 102 may be coupled to communication
network 108. These user systems may be coupled
directly to communication network 108 (e.g. user system 102
depicted in FIG. 1), or may alternatively be coupled to the
communication network via an access provider (not shown)
or via some other server system.

Communication network 108 provides a mechanism
allowing the various components of computer network 100
to communicate and exchange information with each other.
Communication network 108 may itself be comprised of
many interconnected computer systems and communication
links. While in one embodiment, communication network
108 is the Internet, in other embodiments, communication
network 108 may be any suitable communication network
including a local area network (LAN), a wide area network
(WAN), a wireless network, an intranet, a private network, a
public network, a switched network, combinations thereof,
and the like.

Communication links 110 may be hardwire links, optical
links, satellite or other wireless communications links, wave
propagation links, or any other mechanisms for communication
of information. Various communication protocols
may be used to facilitate communication between the various
systems shown in FIG. 1. These communication protocols
may include TCP/IP, HTTP protocols, extensible
markup language (XML), wireless application protocol
(WAP), vendor-specific protocols, customized protocols,
and others.

According to an embodiment of the present invention,
user system 102 can be used by users to identify and access
documents stored by the various computer systems coupled
to communication network 108. In a Web environment,
users may use user systems 102 to access web pages and
other information resources stored by servers, such as web
servers 104, coupled to communication network 108. As
described above, users generally use a browser program
executing on user system 102 to identify, access, and view
web pages and other information stored by computer systems
coupled to communication network 108. User system
102 generally functions as a client requesting information
from the servers coupled to communication network 108.

Web server systems 104 store information resources and
documents which may be accessed by user systems 102
coupled to communication network 108. In a Web
environment, the documents may be stored in the form of
web pages which can be accessed by users using user
systems 102. One or more of the web servers (e.g., servers
104-2 and 104-3 depicted in FIG. 1) may also provide
conventional search engines 112 which allow users to identify
documents of interest to users. Servers which provide
search engines may also store indices 114 which are used by
the search engines to identify documents relevant to the user.

Index server 106 is configured to communicate index
information to user system 102. According to an embodiment
of the present invention, the index information communicated
by index server 106 to user system 102 includes
indices received by index server 106 from one or more web
servers 104 coupled to communication network 108. For
example, for the distributed system depicted in FIG. 1, index
server 106 may receive index information 114-a from server
104-2 and index information 114-b from web server 104-3,
and the index information communicated by index server
106 to user system 102 may include index information 114-a
and 114-b. According to another embodiment of the present
invention, index server 106 may process the indices information
received from the web servers before communicating
the information to user system 102. In yet other embodiments
of the present invention, index server 106 may itself
be configured to generate an index for information stored by
computer systems coupled to communication network 108.
For example, index server 106 may use spiders and crawlers
to collect information related to documents accessible via
communication network 108 and build an index based on the
collected information. Index server 106 may then communicate
the generated index to user system 102.

In alternative embodiments of the present invention, the
index information may be directly communicated from web
servers 104 to user system 102. It should be apparent that a
computer system may function both as a web
server 104 and as an index server 106. Further information
related to the processing performed by computer systems
depicted in FIG. 1 is described below in conjunction with
flowchart 300 depicted in FIG. 3.

The index information communicated to user system 102
comprises information about documents accessible via communication
network 108. For a particular document, the
index information may include information identifying the
document (e.g. title of the document, etc.), information
identifying the location of the document (e.g. information
about web servers storing the document), information
related to the contents of the document (e.g. information
about concepts discussed by the document, or a topic or
subject to which the document relates), and other information
related to the document. The index information is
generally organized in a manner which facilitates identification
of documents and/or locations of the documents
based on criteria related to the documents and/or their
contents. The criteria may include words occurring in the
documents, concepts or topics discussed by the documents,
contents of the documents, servers storing the documents,
and other attributes of the documents.

FIG. 2 is a simplified block diagram of a computer system
200 according to an embodiment of the present invention.
Computer system 200 may be used as a user system 102, an
index server system 106, a web server 104, and other
systems coupled to communication network 108. As shown
in FIG. 2, computer system 200 includes at least one
processor 204, which communicates with a number of
peripheral devices via bus subsystem 202. These peripheral
devices may include a storage subsystem 212, comprising a
memory subsystem 214 and a file storage subsystem 220,
user interface input devices 210, user interface output
devices 208, and a network interface subsystem 206. The
input and output devices allow user interaction with computer
system 200. A user may be a human user, a device, a
process, another computer, and the like. Network interface
subsystem 206 provides an interface to communication
network 108 and may be coupled via the network to corresponding
interface devices in other computer systems.

User interface input devices 210 may include a keyboard,
pointing devices such as a mouse, trackball, touchpad, or
graphics tablet, a scanner, a barcode scanner, a touchscreen
incorporated into the display, audio input devices such as
voice recognition systems, microphones, and other types of
input devices. In general, use of the term "input device" is
intended to include all possible types of devices and ways to
input information into computer system 200 or to communication
networks coupled to computer system 200.

User interface output devices 208 may include a display
subsystem, a printer, a fax machine, or non-visual displays
such as audio output devices. The display subsystem may be
a cathode ray tube (CRT), a flat-panel device such as a liquid
crystal display (LCD), or a projection device. The display
subsystem may also provide non-visual display such as via
audio output devices. In general, use of the term "output
device" is intended to include all possible types of devices
and ways to output information from computer system 200.

Storage subsystem 212 may be configured to store the
basic programming and data constructs that provide the
functionality of the computer system and of the present
invention. For example, according to an embodiment of the
present invention, software modules implementing the functionality
of the present invention may be stored in storage
subsystem 212. These software modules may be executed by
processor(s) 204 of computer system 200. In a distributed
environment, the software modules may be stored on a
plurality of computer systems and executed by processors of
the plurality of computer systems. Storage subsystem 212
may also provide a repository for storing various databases
which may be used to store information according to the
teachings of the present invention. Storage subsystem 212
may comprise memory subsystem 214 and file storage
subsystem 220.

Memory subsystem 214 may include a number of memories
including a main random access memory (RAM) 218
for storage of instructions and data during program execution
and a read only memory (ROM) 216 in which fixed
instructions are stored. File storage subsystem 220 provides
persistent (non-volatile) storage for program and data files,
and may include a hard disk drive, a floppy disk drive along
with associated removable media, a Compact Disc Read
Only Memory (CD-ROM) drive, an optical drive, removable
media cartridges, and other like storage media. One or more
of the drives may be located at remote locations on other
connected computers at another site coupled to communication
network 108. Information stored according to the
teachings of the present invention may also be stored by file
storage subsystem 220.

Bus subsystem 202 provides a mechanism for letting the
various components and subsystems of computer system
200 communicate with each other as intended. The various

and the associated description describes a process for identifying
and accessing web pages of interest to the user, it
should be apparent that the process can also be used to
identify and access other types of documents, e.g., documents
created using application programs such as word
processors, graphics applications, etc., accessible via a network
environment. Flowchart 300 depicted in FIG. 3 is
merely illustrative of an embodiment incorporating the
present invention and does not limit the scope of the
invention as recited in the claims. One of ordinary skill in the
art would recognize other variations, modifications, and
alternatives.

As depicted in FIG. 3, index server 106 configures index
information to be communicated to user system 102 (step
302). Index server 106 may use various techniques to
configure the index information. According to an embodiment
of the present invention, index server 106 configures
the index information based on indices information received
from one or more web servers coupled to communication
network 108. For example, for the distributed system
depicted in FIG. 1, index server 106 may receive index
information 114-a from server 104-2 and index information
114-b from web server 104-3, and index server 106 may
configure or build the index information to be communicated
to user system 102 based on index information 114-a
and 114-b.

According to an embodiment of the present invention,
web servers 104 may communicate the indices information
to index server 106 in response to requests received from
index server 106. In alternative embodiments, web servers
104 may be configured to communicate indices information
to index server 106 on a periodic basis. The length of the
period may be user configurable. Indices information may
be communicated to index server 106 at regular time intervals
to ensure that index server 106 has the latest snapshot
of documents stored by the web servers. Various other
techniques may also be used to ensure that index server 106
has up-to-date information.

According to an embodiment of the present invention,
index server 106 may process the indices information
received from the web servers before communicating the

subsystems and components of computer system 200 need information to user system 102. As part of the processing, 

not be at the same physical location but may be distributed index server 106 may combine information contained in the 

at various locations within distributed network 100. 45 various indices received from the various web servers to 

Although bus subsystem 202 is shown schematically as a form the index information to be communicated to user 

single bus, alternative embodiments of the bus subsystem system 102. 

may utilize multiple busses. In ot her embodiments of the present invention, index 

Computer system 200 itself can be of varying types server 106 may itself generate or configure an index for 

including a personal computer, a portable computer, a 50 documents stored by computer systems coupled to commu- 

workstation, a computer terminal, a network computer, a nication network 108. For example, index server 106 may 

mainframe, a kiosk, a personal data assistant (PDA), a use spiders, crawlers, etc. to collect information related to 

communication device such as a cell phone, or any other web pages accessible via communication network 108. 

data processing system. Due to the ever-changing nature of Index server 106 may then build an index based on the 

computers and networks, the description of computer system 55 documents related information collected by the spiders, 

200 depicted in FIG. 2 is intended only as a specific example crawlers, etc. 

for purposes of illustrating the preferred embodiment of the since the size of the index information can be quite large, 

computer system. Many other configurations of a computer according to an embodiment of the present invention, index 

system are possible having more or fewer components than serve,. iQ6 performs processing to reduce the size of the 

the computer system depicted in FIG. 2. Computer system 60 index information to be communicated to user system 102. 

200 may function as a client or a server, or combinations As part of this processing, index server 106 may use various 

thereof. filters reduce the size of the index information to be com- 

FTG. 3 is a simplified high-level flowchart 300 depicting municated to user system 102. Index server 106 may also 

processing performed according to an embodiment of the use various data compression techniques to reduce the size 

present invention to identify and access documents (e.g., 65 of the index information communicated to user system 102. 

web pages in a Web environment) of interest to a user According to an embodiment of the present invention, index 

without compromising the user's privacy. Although FIG. 3 server 106 may be configured to communicate index infor- 
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mation to user system 102 in an incremental manner, such 
that only new or changed index information (a delta from the 
previously communicated index information) is communi- 
cated to user system 102. 
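
The combining and incremental steps described above can be sketched as
follows. This is a minimal illustration, not the patented implementation:
it assumes each index is a mapping from URL to an entry, and the function
names merge_indices and index_delta are hypothetical. Reporting removed
URLs is an added assumption; the description mentions only new or changed
information.

```python
def merge_indices(*indices):
    """Combine per-server indices (URL -> entry) into one index.

    Assumption: if two servers report the same URL, the later one wins.
    """
    merged = {}
    for index in indices:
        merged.update(index)
    return merged


def index_delta(previous, current):
    """Return (changed, removed) relative to the previously sent index.

    changed: entries that are new or differ from the previous snapshot.
    removed: URLs present before but absent now (an added assumption).
    """
    changed = {url: entry for url, entry in current.items()
               if previous.get(url) != entry}
    removed = sorted(url for url in previous if url not in current)
    return changed, removed
```

Only the delta would then need to be compressed and sent to the user
system, rather than the full merged index.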

According to an embodiment of the present invention,
index server 106 may be configured to communicate only a
portion (referred to as "partial index information") of the
index information received by or configured by index server
106 to user system 102. Index server 106 may determine the
contents of the partial index information to be communicated
to user system 102 based on criteria provided to index
server 106. According to an embodiment of the present
invention, the criteria are user-configurable. The user-configured
criteria are generally abstract and generic enough that
user privacy is not compromised. For example, a particular
user may only be interested in documents related to Sports, 
and accordingly may indicate to index server 106 that the 
user system used by the user should only receive index 
information related to "Sports" related documents. In this 
scenario, index server 106 may extract sports-related infor- 
mation from the index information received by or configured 
by index server 106, and communicate only the sports- 
related index information to user system 102 in step 304. 
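
As a rough sketch of how partial index information might be extracted,
the function below keeps only entries whose topics match broad,
user-supplied categories such as "Sports". The entry layout (a dict with
a "topics" list) and the name partial_index are assumptions made for
illustration.

```python
def partial_index(index, topics_of_interest):
    """Keep only entries that share at least one topic with the criteria.

    index: mapping of URL -> {"topics": [...], ...} (assumed layout).
    topics_of_interest: coarse categories chosen by the user.
    Matching is case-insensitive and exact on the topic label.
    """
    wanted = {topic.lower() for topic in topics_of_interest}
    return {url: entry for url, entry in index.items()
            if wanted & {topic.lower() for topic in entry["topics"]}}
```

Because the criteria are coarse categories rather than individual
queries, the index server learns little about what the user actually
searches for within the delivered subset.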

Index server 106 then communicates the index informa- 
tion configured in step 302 to user system 102 (step 304). 
The index information comprises information about web 
pages accessible via communication network 108. The index 
information may include information identifying the web 
pages, information identifying the location of the web pages 
(e.g. URLs corresponding to the web pages, information 
about servers storing the web pages, etc.), information 
related to the contents of the web pages (e.g. concepts/ 
topics/subjects discussed by the web pages), and other 
information related to the web pages. The index information 
is generally organized in a manner which facilitates identi- 
fication of web pages and/or locations of the web pages 
based on criteria related to the web pages and/or their 
contents. The criteria may include words occurring in the 
web pages, concepts or topics discussed by the web pages, 
contents of the web pages, servers storing the web pages, 
and other attributes of the web pages. 

According to an embodiment of the present invention, 
index server 106 may communicate the index information to 
user system 102 on a periodic basis. The time period 
between transfers may be user configurable. Index informa- 
tion may be communicated to user system 102 at regular 
time intervals to ensure that the user system has the latest 
snapshot of the documents stored by the servers. Various 
other techniques may also be used to ensure that user system 
102 has up-to-date information about the web pages and 
documents stored by web servers 104. In an alternative 
embodiment, index server 106 may communicate the index 
information to user system 102 in response to information 
requests received from user system 102. For example, user 
system 102 may send a signal to index server 106 requesting 
the index server to download the index information to the 
requesting user system. 

In alternative embodiments of the present invention, user 
system 102 may also receive index information directly 
from web servers 104 which provide search engines (step 
306). Web servers 104 may communicate the indices infor- 
mation to user system 102 on a periodic basis. The time 
period between transfers may be user configurable. In alter- 
native embodiments, web servers 104 may communicate the 
index information to user system 102 in response to requests 
received from user system 102. 

FIG. 4 depicts a portion of index information that may be
communicated from index server 106 (or from web servers
104 which provide search engines) to user system 102
according to an embodiment of the present invention. The
index information depicted in FIG. 4 comprises information
related to a plurality of documents stored by the CNNSI
website. For each document, the index information indicates
a URL 402 which can be used to access the document, a list
of topics, subjects, concepts 404 to which the document
relates, and a title 406 of the document. The topic/subject/
concept information for a document comprises information
related to the contents of the document. For example, URL
"sportsillustrated.cnn.com/baseball/mlb/news/2001/05/07/
cordova_indians_ap" can be used to access a document
titled "Cordova flourishing in potent Cleveland lineup"
which contains information related to concepts/topics
"Cleveland Indians" and "Major League Baseball". User
system 102 can process this index information to find
documents that might be of interest to the user. It should be
apparent that the index information received by user system
102 may be in various other formats and may contain more
or less information than the portion of the index information
depicted in FIG. 4.
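
One possible in-memory representation of a single index entry with the
three fields described for FIG. 4 is sketched below. The class name
IndexEntry and its field names are hypothetical; the description does not
prescribe a storage format.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class IndexEntry:
    """One document in the index: URL 402, topics 404, title 406."""
    url: str
    topics: tuple
    title: str


# The example document discussed in the text, expressed as an entry.
entry = IndexEntry(
    url=("sportsillustrated.cnn.com/baseball/mlb/news/2001/05/07/"
         "cordova_indians_ap"),
    topics=("Cleveland Indians", "Major League Baseball"),
    title="Cordova flourishing in potent Cleveland lineup",
)
```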

Referring back to FIG. 3, user system 102
receives index information communicated either from index
server 106 or from one or more web servers 104 (step 308).
The index information received by user system 102 is then
used to identify web pages of interest to the user on the user
system (step 310). Various different techniques may be used
to identify web pages of interest to or relevant to the user
using the index information received in step 308. According
to the teachings of the present invention, for each of the
identification techniques, the entire processing for identifying
web pages of interest to the user is performed on user
system 102 itself, substantially free from any interaction
with web servers 104 and index server 106. Unlike conventional
search engines, the user does not have to provide
search queries or other information to remote search engine
servers where the information may be mined and/or tracked.
Since all identification operations are performed locally on
user system 102, the user has complete control over the type
of information which can be tracked during the identification
process, and also has complete control over the distribution
of the information. The user's privacy is thus preserved as
user-related information cannot be tracked/mined or distributed
from user system 102 without the user's authorization.

Since relevant web pages are identified based on index
information which is locally stored on user system 102, user
system 102 does not have to be connected to communication
network 108 during the identification process, i.e. the index
information received by user system 102 can be searched
and web pages of interest to the user can be identified in an
offline manner. This is substantially different from conventional
web page identification techniques using search
engines executing on remote servers which require that the
user system have a network connection to the server executing
the search engine in order for the search to be performed
and relevant web pages identified. Local access to the index
information also increases the speed of the identification
process as compared to conventional network-based search
techniques which are usually executed on remote servers.

As indicated above, various different techniques may be
used to identify web pages of interest to or relevant to the
user using the index information received in step 308.
According to an embodiment of the present invention, a
localized search engine is provided to facilitate the identification
of relevant documents (web pages). The localized
search engine executes on user system 102. The localized
search engine is configured to accept a search query from the
user, and search the index information to identify URLs
corresponding to web pages which satisfy or are relevant to
the search query. According to an embodiment of the present
invention, the localized search engine may be coupled to a
browser program executing on user system 102 and receive
the search query via the browser. URLs identified by the
localized search engine may be presented to the user using
the browser interface.

The user query may contain query terms which describe,
for example, a topic or concept for which the user is
interested in finding more information. For example, if the
user is interested in finding information on Thai cooking, the
query terms may include the words "Thai" and "cooking".
In response to the query, the localized search engine is
configured to identify URLs corresponding to web pages
related to Thai cooking. Various different comparison and
search techniques may be used by the localized search
engine to search the index information to identify documents
relevant to the user query. The documents identified by the
localized search engine may be presented to the user via an
output device coupled to user system 102.

According to the teachings of the present invention, the
localized search engine executes locally on user system 102
and uses index information received from index server 106
or from web servers 104 and which is stored locally by user
system 102. Thus, unlike conventional search techniques,
the user query is not communicated to a remote search
engine server where the information contained in the query
may be mined or tracked by the server. Since all operations
are performed on user system 102, the user has complete
control on the type of information, if any, which is tracked/
stored by the localized search engine. The user also has
complete control over the distribution of information, if any,
tracked by the localized search engine. The user's privacy is
thus preserved as user-related information cannot be
tracked/mined or distributed from user system 102 without
the user's authorization.

According to another embodiment of the present
invention, documents (e.g. web pages) relevant to the user
may be automatically identified based upon the index information
received from index server 106 or from web servers
104 and based upon user-related information which may be
stored (e.g. a user profile) by user system 102. The user
profile may include information about a user's likes/dislikes,
preferences, commonly visited web pages, concepts or topics
of interest to the user, and other information related to the
user or the user's browsing behavior. An "automated document
selector" module may be provided which executes on
user system 102 and identifies documents or web pages
(identified by URLs) of potential interest to the user based
upon user information accessible via user system 102 (e.g.,
user information stored in a user profile file). The URLs
corresponding to web pages identified to be of potential
interest to the user may be presented to the user via an output
device coupled to user system 102. As with the other
document identification techniques, according to the present
invention, the processing performed to automatically identify
relevant web pages is performed locally on user system
102. The user thus has complete control over the type of
information which is stored by the identification techniques
and over the dissemination of the information. This in turn
preserves the privacy and security of the user.

The user profile information may have been collected
using various techniques. Examples of techniques for collecting
user information and generating user profiles are
described in U.S. patent application Ser. No. 09/510,902
entitled "METHOD FOR CREATING USER PROFILES",
U.S. patent application Ser. No. 09/511,034, entitled "SYSTEM
FOR CREATING USER PROFILES", and U.S. patent
application Ser. No. 09/510,904, entitled "COMPUTER
PROGRAM FOR CREATING USER PROFILES", (all
currently pending), the entire disclosures of which are herein
incorporated by reference for all purposes.

According to an embodiment of the present invention, the
localized search engine described above may also be used to
track information which may be used for building
user profiles. The localized search engine may provide a
user-configurable option which when selected by the user
enables the localized search engine to mine the query
information provided by the user to the localized search
engine. The information mined by the localized search
engine may then be used to build or augment the user profile
information stored by user system 102. Since the mining/
tracking of information is user-configurable and performed
locally on user system 102, the user can control when and
what type of information is mined by the localized search
engine. Further, since the mined information is stored locally
on the user system, the user has complete control over the
distribution/communication of the tracked information. The
privacy of the user information is thus not compromised.

Various other techniques may also be used to identify web
pages and other documents of interest to the user based upon
the index information received from index server 106 and/or
from web servers 104.

Referring back to FIG. 3, the user may then select and
access one or more web pages or documents from the web
pages/documents identified in step 310 (step 312). For
example, the user may select a URL identified in step 310
and access the web page corresponding to the URL from a
web server storing the corresponding web page. In this
manner, the present invention enables the user to identify
and access web pages and other documents of interest to the
user.

In a specific embodiment, the present invention also
provides techniques for ensuring that the web server from
which a web page is accessed in step 312 does not track
information from the user system. For example, web pages
accessed by the user in step 312 may have "cookies"
associated with them. A "cookie" is a mechanism that allows
the web server storing the web page to collect information
about a user who accesses the web page. A cookie is usually
transmitted to user system 102 along with a web page
accessed by the user and is configured to collect information
about the user, e.g. information related to the user's interaction
with the accessed web page(s). The information
collected by a cookie may be stored on user computer 102
or may be transmitted back to the web server. Since cookies
generally collect information without the user's knowledge
or authorization, the collection and dissemination of that
information constitutes a breach of the user's privacy.

In order to prevent such breaches of privacy, according to
an embodiment of the present invention, an "audit module"
is provided which executes on user system 102 and ensures
that information about the user is not tracked or monitored
on user system 102, or communicated to external computer
systems, including web servers 104 and index server 106 by
using techniques such as cookies, without the user's authorization.
The audit module includes a "sniffer" that monitors
the presence of cookies and also monitors information being
transmitted from user computer 102 to external entities. If
the audit module detects that information related to the user
is being tracked or communicated without the user's
permission, the user is notified of such a condition.




Examples of violating conditions may include the presence
of a cookie associated with a web page, communication of
user information from the user system to a web server or to
the index server, tracking/mining of user information without
the user's authorization, etc. The user may be notified
about the violating condition via an audio alarm, a flashing
graphical user indication (e.g., a flashing icon), a streaming
audio/video message, an email, audio output, video output,
combinations of the aforementioned techniques and the
like. The audit module may then allow the user to remove
the violating condition. For example, the audit module may
allow the user to delete cookies associated with web pages
accessed by the user, or provide a filter which automatically
deletes cookies associated with any web page accessed by
the user. In this manner, the audit module enables the user
to preserve the privacy of user-related information. In a
specific configuration, the audit module may be programmed
to automatically prevent the monitoring and communication
of any user related information by an external
computer entity from user system 102.

In alternative embodiments of the present invention, an
audit server coupled to one or more user systems 102 may
be provided to perform the tasks performed by an audit
module. In this embodiment, the audit server sniffer ensures
that the information going back to the web server from any
user system is the same, regardless of the user system, i.e.,
each user system will send back an acknowledgment packet,
which indicates that the index information has been
received. Each user system may also send back its IP
address. In preferred embodiments, there is substantially no
other information that is sent from a user system 102 to the
web server. Any other information may be tagged as a
violating condition. The sniffer monitors the user system
activity to make sure that information from the user system
is not being communicated to the index server. The audit
server is used only to verify that the privacy of the user is
being maintained and is not an essential part of the system
described in this invention.

FIG. 5 depicts information stored by user system 102 and
modules which are executed by user system 102 to provide
features according to an embodiment of the present invention.
Although FIG. 5 and the associated description
describe modules for identifying and accessing web pages
of interest to the user, it should be apparent that the modules
can also be used to identify and access other types of
documents such as documents created using application
programs such as word processors, communication
programs, etc. The modules and information depicted in
FIG. 5 are merely illustrative of an embodiment incorporating
the present invention and do not limit the scope of the
invention as recited in the claims. One of ordinary skill in the
art would recognize other variations, modifications, and
alternatives. For example, in alternative embodiments of the
present invention, one or more of the modules depicted in
FIG. 5 may be combined into a single module, and/or a
single module depicted in FIG. 5 may be broken down into
several modules. The modules depicted in FIG. 5 may be
implemented in software or hardware, or combinations
thereof. The software modules are executed by the processor
of user system 102.

According to an embodiment of the present invention, the
modules depicted in FIG. 5 include a browser program
module 504, a localized search engine module 506, an
automated document selector module 508, an audit module
510, and a communication module 502. The information
stored by user system 102 may include index information
512 and user profile information 514. Index information 512
contains information related to documents hosted by computer
systems in a network environment. As described
above, index information 512 is received from index server
106 or from web servers 104 which provide search engines.
User profile information 514 comprises information related
to the user's preferences, likes/dislikes, etc. The information
is generally stored in the storage subsystem of user system
102.

According to an embodiment of the present invention,
communication module 502 is configured to facilitate communication
of information and data to and from user system
102. Communication module 502 may receive web page
requests from the various modules of user system 102 and
forward the requests to web servers. In response to the web
page requests, communication module 502 may receive web
pages from the web servers.
Communication module 502 may forward the web pages to
browser 504 for output to the user. In alternative embodiments
of the present invention, communication module 502
may also receive requests (not shown in FIG. 5) to download
index information from index server 106. Communication
module 502 may communicate these requests to index server
106.

Communication module 502 may also receive index information
either from index server 106 or from web servers 104.
Communication module 502 may store the index information
in the storage subsystem of user system 102 or may
alternatively forward the information to appropriate modules
of user system 102. Communication of other information
to and from user system 102 may also be handled by
communication module 502.

As described above, browser program 504 enables a user
to retrieve and access web pages. Examples of
browsers include the Internet Explorer browser program
provided by Microsoft Corporation, and the Netscape Navigator
browser provided by Netscape Corporation, and others.
A user generally accesses a web page by providing URL
information corresponding to the web page to browser 504,
either directly or indirectly. Browser 504 communicates the
URL web page request to communication module 502 which
communicates the request to a web server storing the
requested web page. The requested web page corresponding
to the user-provided URL received by user system 102 from
the web server can then be displayed to the requesting user
via browser 504.

According to an embodiment of the present invention, the
user may also provide search requests to localized search
engine 506 via browser 504. Browser 504 may be configured
to receive URLs from localized search engine 506 in
response to the search requests, and to output the URLs to
the user. The user may then select a URL from the list of
URLs provided by localized search engine 506 and access a
web page corresponding to the selected URL via browser
504.

According to an embodiment of the present invention,
localized search engine 506 is configured to identify web
pages of interest to the user based upon index information
512 received by user system 102 and based upon search
queries received from the user describing topics or concepts
of interest to the user and for which the user is interested in
finding relevant web pages. As shown in FIG. 5, localized
search engine 506 may receive the user search queries via
browser 504. In alternative embodiments of the present
invention, localized search engine 506 may receive search queries
directly from the user, e.g., via a user interface provided
by the localized search engine. Upon receiving a user
query, localized search engine 506 searches index information
512 to identify relevant web pages (identified by URLs)
which satisfy the search query. The list of URLs corresponding
to the relevant web pages is provided to the user via
browser 504. As described above, the user may then select
one or more URLs from the list of URLs identified by
localized search engine 506 and access web pages corresponding
to the selected URLs.
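
A minimal sketch of a localized search over locally stored index
information is given below. The description leaves the comparison and
search techniques open, so simple term counting is used here purely as an
illustrative stand-in; the entry layout and the name local_search are
assumptions.

```python
def local_search(index, query):
    """Return URLs whose title or topics mention the query terms.

    index: mapping of URL -> {"title": str, "topics": [str, ...]}.
    Results are ordered by number of matching terms, then by URL.
    Nothing in this function touches the network; the query stays
    on the user system.
    """
    terms = query.lower().split()
    hits = []
    for url, entry in index.items():
        text = " ".join([entry["title"], *entry["topics"]]).lower()
        score = sum(term in text for term in terms)
        if score:
            hits.append((score, url))
    hits.sort(key=lambda hit: (-hit[0], hit[1]))
    return [url for _, url in hits]
```

A browser front end could pass the text of a search box to such a
function and render the returned URLs as links.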

As previously described, according to an embodiment of
the present invention, localized search engine 506 may also
track and/or mine information contained in the search queries
provided by the user to the localized search engine. The
option to mine user query information is user-configurable.
The information mined by localized search engine 506 may
be used to build or augment user profile information 514
which may be stored in the storage subsystem of user system
102.
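
The user-configurable mining option might look like the following
sketch, in which query terms are counted only when the user has enabled
mining. The class LocalProfile and its attributes are hypothetical, and
term counting is just one assumed way of summarizing queries.

```python
from collections import Counter


class LocalProfile:
    """Locally stored profile that records query terms only on opt-in."""

    def __init__(self, mining_enabled=False):
        # Mining is off unless the user explicitly turns it on.
        self.mining_enabled = mining_enabled
        self.term_counts = Counter()

    def record_query(self, query):
        """Count the query's terms if, and only if, mining is enabled."""
        if self.mining_enabled:
            self.term_counts.update(query.lower().split())
```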

According to an embodiment of the present invention,
automated document selector module 508 is configured to
automatically identify web pages of interest to the user
based upon user profile information 514 and index information
512. Web pages (identified by URLs) identified by
automated document selector module 508 to be of interest to
the user may be output to the user via browser 504. The user
may then select one or more URLs from the list of URLs
identified by automated document selector module 508 and
access web pages corresponding to the selected URLs.
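
The selection step can be illustrated with a small sketch that ranks
index entries by how many of their topics appear in the user profile.
The scoring rule, the limit parameter, and the name select_documents are
assumptions; the description does not specify how profile and index
information are compared.

```python
def select_documents(index, profile_topics, limit=10):
    """Suggest URLs whose topics overlap the user's profile topics.

    index: mapping of URL -> {"topics": [str, ...]} (assumed layout).
    profile_topics: topics of interest held in the local user profile.
    Returns at most `limit` URLs, best overlap first, ties by URL.
    """
    interests = {topic.lower() for topic in profile_topics}
    scored = []
    for url, entry in index.items():
        overlap = len(interests & {topic.lower() for topic in entry["topics"]})
        if overlap:
            scored.append((overlap, url))
    scored.sort(key=lambda item: (-item[0], item[1]))
    return [url for _, url in scored[:limit]]
```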

According to an embodiment of the present invention,
audit module 510 prevents unauthorized monitoring and
communication of information related to the user or to the
user's browsing activities from user system 102. For
example, audit module 510, in cooperation with browser
504, tracks the presence of cookies associated with web
pages accessed by the user via browser 504. If audit module
510 detects a cookie, the user is notified of the cookie and
provided an option to delete the cookie. Audit module 510,
in cooperation with communication module 502, also monitors
information communicated to and from user system 102
to prevent communication of user-related information to
index server 106 or to web servers 104 without the user's
permission.
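
One way the cookie check might be approximated is to inspect the headers
of each HTTP response for Set-Cookie fields from hosts the user has not
authorized. This is a sketch under that assumption only; the function
name, the header representation as (name, value) pairs, and the
authorized-host list are hypothetical and not taken from the description.

```python
def find_cookie_violations(response_headers, host, authorized_hosts=()):
    """Return cookie strings set by a host the user has not authorized.

    response_headers: iterable of (name, value) pairs from one response.
    host: the server that sent the response.
    authorized_hosts: hosts the user has allowed to set cookies.
    An empty list means no violating condition was found.
    """
    if host in authorized_hosts:
        return []
    return [value for name, value in response_headers
            if name.lower() == "set-cookie"]
```

A caller could notify the user for each returned cookie and offer to
delete it, mirroring the notification and removal options described
above.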

As described above, the present invention enables a user 
to identify and access documents stored in a network envi- 
ronment without compromising user privacy. Instead of 
using remote search engine servers which can mine user 
information to identify documents of interest to the user, the 45 
present invention identifies the relevant documents based 
upon index information received by user system 102. Since 
the process of identifying relevant documents is performed 
locally on user system 102 using index information locally 
stored on user system 102, remote servers such as index so 
server 106 and web server 104 do not have to receive any 
information (e.g. search queries, etc.) related to the user 
which can be mined to build user profiles. Accordingly, the 
identification of documents of interest to the user is per- 
formed by the user system substantially free from any 55 
interactions with other computer systems coupled to com- 
munication network 108. The local search also obviates the 
need to use remote search engines which can track both 
query information and information about web pages 
accessed by the user using the remote search engine. The
present invention also ensures that user-related information
is not tracked by web servers via mechanisms such as 
cookies. The user's privacy is thus preserved as user-related 
information cannot be tracked/mined or accessed by any 
computer system remote from the user system 102 without
the user's authorization. User privacy and security are
consequently preserved.
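
The local identification step summarized above can be illustrated with a short sketch. It assumes, purely for illustration, that the index information received by user system 102 is held as a list of entries, each carrying a document identifier (a URL), the server storing the document, and a set of keywords describing its contents; the names `IndexEntry` and `search_local_index` are hypothetical. The point of the sketch is that the query is matched entirely against locally stored data, so no query text is sent to any remote server.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class IndexEntry:
    """One record of locally stored index information (assumed layout)."""
    url: str              # information identifying the document
    server: str           # information identifying the server storing it
    keywords: frozenset   # information related to the document's contents


def search_local_index(index, query):
    """Return URLs of entries whose keywords contain every query term.

    The search reads only the in-memory index; it performs no network I/O.
    """
    terms = {term.lower() for term in query.split()}
    if not terms:
        return []
    return [entry.url for entry in index if terms <= entry.keywords]
```

Only after the user selects one of the returned URLs would the user system contact the server storing that document.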

Although specific embodiments of the invention have
been described, various modifications, alterations, alterna- 
tive constructions, and equivalents are also encompassed 
within the scope of the invention. The described invention is 
not restricted to operation within certain specific data pro- 
cessing environments, but is free to operate within a plural- 
ity of data processing environments. Additionally, although 
the present invention has been described using a particular 
series of transactions and steps, it should be apparent to 
those skilled in the art that the scope of the present invention 
is not limited to the described series of transactions and 
steps. 

Further, while the present invention has been described 
using a particular combination of hardware and software, it 
should be recognized that other combinations of hardware 
and software are also within the scope of the present 
invention. The present invention may be implemented only 
in hardware or only in software or using combinations 
thereof. 

The specification and drawings are, accordingly, to be 
regarded in an illustrative rather than a restrictive sense. It 
will, however, be evident that additions, subtractions, 
deletions, and other modifications and changes may be made 
thereunto without departing from the broader spirit and 
scope of the invention as set forth in the claims. 
What is claimed is: 

1. In a network environment comprising a user system, an 
index server, and a plurality of web servers storing a 
plurality of documents, a method of accessing a first docu- 
ment from the plurality of documents using the user system, 
the method comprising: 

determining, at the index server, index information to be 
communicated to the user system, the index informa- 
tion comprising information identifying the plurality of 
documents stored by the plurality of web servers and 
information related to the contents of the plurality of 
documents; 

communicating the index information from the index
server to the user system; and

at the user system:
identifying a first set of documents from the plurality of 
documents using the index information received 
from the index server, the first set of documents 
including the first document, wherein the first set of 
documents is identified substantially free from inter- 
action with the index server and the plurality of web 
servers; 

receiving a signal indicating selection of the first docu- 
ment from the first set of documents; and 
responsive to the signal, accessing the selected first 
document from a web server storing the first docu- 
ment. 

2. The method of claim 1 wherein determining, at the 
index server, index information to be communicated to the 
user system comprises: 

at the index server: 
for each document in the plurality of documents: 
determining information identifying the document; 
determining information identifying a web server
storing the document; and
determining information related to the contents of 
the document; and 
generating the index information based upon the infor- 
mation identifying the plurality of documents, the 
information identifying web servers storing the plu- 
rality of documents, and the information related to 
the contents of the plurality of documents. 

3. The method of claim 2 wherein generating the index
information comprises: 

at the index server: 

accessing a first set of criteria; 

from the information identifying the plurality of
documents, the information identifying web servers 
storing the plurality of documents, and the informa- 
tion related to the contents of the plurality of 
documents, determining information which satisfies 
the first set of criteria; and

generating the index information based upon the infor- 
mation which satisfies the first set of criteria. 

4. The method of claim 1 wherein determining, at the 
index server, index information to be communicated to the 
user system comprises: 

at the index server: 

receiving first index information from a first server 
providing a first search engine, the first index infor- 
mation comprising information related to documents 
from the plurality of documents which can be identified
using the first search engine; and

configuring the index information to be communicated 
to the user system based upon the first index infor- 
mation. 

5. The method of claim 1 wherein determining, at the 
index server, index information to be communicated to the
user system comprises: 

at the index server: 

receiving first index information from a first server 
providing a first search engine, the first index information
comprising information related to documents
from the plurality of documents which can be iden- 
tified using the first search engine; 

receiving second index information from a second 
server providing a second search engine, the second 
index information comprising information related to
documents from the plurality of documents which 
can be identified using the second search engine; and 

configuring the index information to be communicated 
to the user system based upon the first index information
and the second index information.

6. The method of claim 5 wherein configuring the index 
information to be communicated to the user system com- 
prises combining the first index information and the second 
index information to generate the index information. 

7. The method of claim 1 wherein identifying, at the user
system, the first set of documents from the plurality of 
documents using the index information received from the 
index server comprises: 

at the user system: 

receiving a search query;
responsive to receiving the search query, searching the 
index information to identify documents from the 
plurality of documents which satisfy the search 
query; and 

including the documents which satisfy the search query
in the first set of documents. 

8. The method of claim 1 wherein identifying, at the user 
system, the first set of documents from the plurality of 
documents using the index information received from the 
index server comprises:

at the user system: 

accessing information related to a user of the user 
system; 

searching the index information to identify documents 
from the plurality of documents based upon the
information related to the user of the user system;
and

including the documents identified based upon the
information related to the user of the user system in 
the first set of documents. 

9. The method of claim 1 wherein: 

the plurality of documents stored by the plurality of web 
servers are a plurality of web pages, and the first set of 
documents includes a first set of web pages from the 
plurality of web pages; 

identifying, at the user system, the first set of documents 
from the plurality of documents using the index information
received from the index server comprises identifying
a first set of URLs corresponding to the first set of
web pages; 

receiving a signal indicating selection of the first docu- 
ment comprises receiving a signal indicating selection 
of a first URL from the first set of URLs; and 

accessing the selected first document comprises accessing 
a web page corresponding to the selected first URL. 

10. The method of claim 1 wherein accessing the selected 
first document comprises: 

determining if the web server storing the first document is 
tracking information from the user system; and 

if the web server storing the first document is tracking 
information from the user system, preventing the web 
server from tracking the information from the user 
system. 

11. The method of claim 10 wherein determining if the 
web server storing the first document is tracking information 
from the user system comprises: 

determining if a cookie is associated with the first docu- 
ment accessed using the user system. 

12. A method of accessing a first document from the 
plurality of documents stored by a plurality of servers using 
a user system, the method comprising: 

at the user system: 

receiving index information from an index server, the 
index information comprising information identify- 
ing the plurality of documents stored by the plurality 
of servers and information related to the contents of 
the plurality of documents; 

identifying a first set of documents from the plurality of 
documents using the index information received 
from the index server, the first set of documents 
including the first document, wherein the first set of 
documents is identified substantially free from inter- 
action with the index server and the plurality of 
servers; 

receiving a signal indicating selection of the first docu- 
ment from the first set of documents; and 

responsive to the signal, accessing the selected first 
document from a server storing the first document. 

13. The method of claim 12 wherein the index informa- 
tion received by the user system comprises information for 
the plurality of documents collected by the index server, the 
information collected by the index server comprising infor- 
mation identifying the plurality of documents, information 
identifying servers storing the plurality of documents, and 
information related to the contents of the plurality of docu- 
ments. 

14. The method of claim 12 wherein receiving index 
information from the index server comprises: 

communicating a first set of criteria from the user system 
to the index server; and 

wherein the index information received by the user sys- 
tem from the index server comprises information sat- 
isfying the first set of criteria. 

15. The method of claim 12 wherein the index information
received by the user system comprises first index
information communicated by a first server providing a first
search engine to the index server, and second index information
communicated by a second server providing a second
search engine to the index server,

wherein the first index information comprises information 
identifying documents from the plurality of documents 
which can be identified using the first search engine; 
and

wherein the second index information comprises infor- 
mation identifying documents from the plurality of 
documents which can be identified using the second 
search engine. 

16. The method of claim 12 wherein identifying the first
set of documents from the plurality of documents using the 
index information received from the index server comprises: 

at the user system: 

receiving a search query; 

responsive to receiving the search query, searching the
index information to identify documents from the 
plurality of documents which satisfy the search 
query; and 

including the documents which satisfy the search query 
in the first set of documents.

17. The method of claim 12 wherein identifying the first
set of documents from the plurality of documents using the 
index information received from the index server comprises: 

at the user system: 

accessing information related to a user of the user
system; 

searching the index information to identify documents 
from the plurality of documents based upon the 
information related to the user of the user system; 
and

including the documents identified based upon the 
information related to the user of the user system in 
the first set of documents. 

18. The method of claim 12 wherein: 

the plurality of documents stored by the plurality of
servers are a plurality of web pages, and the first set of 
documents includes a first set of web pages from the 
plurality of web pages; 

identifying the first set of documents from the plurality of
documents using the index information received from
the index server comprises identifying a first set of URLs
corresponding to the first set of web pages; 

receiving a signal indicating selection of the first docu- 
ment comprises receiving a signal indicating selection 
of a first URL from the first set of URLs; and 

accessing the selected first document comprises accessing 
a web page corresponding to the selected first URL. 

19. The method of claim 12 wherein accessing the 
selected first document comprises:

determining if the server storing the first document is 
tracking information from the user system; and 

if the server storing the first document is tracking infor- 
mation from the user system, preventing the server 
from tracking information from the user system.

20. The method of claim 19 wherein determining if the 
server storing the first document is tracking information 
from the user system comprises: 

determining if a cookie is associated with the first document
accessed using the user system.

21. A computer program product stored on a computer-readable
storage medium for accessing a first document
using a user system from a plurality of documents stored in
a network environment, the computer program product 
comprising: 

code for receiving index information from an index 
server, the index information comprising information 
identifying the plurality of documents, information 
related to contents of the plurality of documents, and 
information identifying servers storing the plurality of 
documents; 

code for identifying a first set of documents from the 
plurality of documents using the index information 
received from the index server, the first set of docu- 
ments including the first document, wherein the first set 
of documents is identified substantially free from inter- 
action with the index server and the servers storing the 
plurality of documents; 

code for receiving a signal indicating selection of the first 
document from the first set of documents; and 

responsive to the signal, code for accessing the selected 
first document from a server storing the first document. 

22. The computer program product of claim 21 wherein 
the code for receiving index information from the index 
server comprises: 

code for communicating a first set of criteria from the user 
system to the index server; and 

wherein the index information received by the user sys- 
tem from the index server comprises information sat- 
isfying the first set of criteria. 

23. The computer program product of claim 21 wherein 
the code for identifying the first set of documents from the 
plurality of documents using the index information received 
from the index server comprises: 

code for receiving a search query; 

responsive to receiving the search query, code for search- 
ing the index information to identify documents from 
the plurality of documents which satisfy the search 
query; and 

code for including the documents which satisfy the search 
query in the first set of documents. 

24. The computer program product of claim 21 wherein 
the code for identifying the first set of documents from the 
plurality of documents using the index information received 
from the index server comprises: 

code for accessing information related to a user of the user 
system; 

code for searching the index information to identify 
documents from the plurality of documents based upon 
the information related to the user of the user system; 
and 

code for including the documents identified based upon 
the information related to the user of the user system in 
the first set of documents. 

25. The computer program product of claim 21 wherein the code
for accessing the selected first document comprises: 

code for determining if the server storing the first docu- 
ment is tracking information from the user system; and 

if the server storing the first document is tracking infor- 
mation from the user system, code for preventing the 
server from tracking information from the user system. 

26. A system for accessing information comprising: 
a communication network; 

a plurality of web server systems coupled to the commu- 
nication network, the plurality of web server systems 
configured to store a plurality of documents, the plu- 
rality of documents including a first document; 

an index server system coupled to the communication
network; and

a user system;

wherein the index server system is configured to determine
index information to be communicated to the user
system, the index information comprising information 
identifying the plurality of documents stored by the 
plurality of web server systems and information related 
to the contents of the plurality of documents; 

wherein the index server system is configured to communicate
the index information to the user system; and

wherein the user system is configured to:

identify a first set of documents from the plurality of
documents using the index information received
from the index server system, the first set of documents
including the first document, the identification
of the first set of documents being performed substantially
free from interaction with the index server
system and the plurality of web server systems;

receive a signal indicating selection of the first document
from the first set of documents; and

in response to the signal, access the selected first
document from a web server storing the first document.

27. The system of claim 26 wherein to determine the 
index information to be communicated to the user system, 
the index server system is configured to: 

for each document in the plurality of documents:
determine information identifying the document;
determine information identifying a web server storing
the document; and
determine information related to the contents of the
document; and

configure the index information based upon the information
identifying the plurality of documents, the information
identifying web servers storing the plurality of
documents, and the information related to the contents 
of the plurality of documents. 

28. The system of claim 27 wherein to configure the index
information, the index server system is configured to: 

access a first set of criteria; 

from the information identifying the plurality of 
documents, the information identifying web servers 
storing the plurality of documents, and the information
related to the contents of the plurality of documents, 
determine information which satisfies the first set of 
criteria; and 

generate the index information based upon the information
which satisfies the first set of criteria.

29. The system of claim 26 wherein to determine the 
index information to be communicated to the user system, 
the index server system is configured to: 

receive first index information from a first server providing
a first search engine, the first index information
comprising information related to documents from the 
plurality of documents which can be identified using 
the first search engine; and 

configure the index information to be communicated to
the user system based upon the first index information. 

30. The system of claim 26 wherein to determine the 
index information to be communicated to the user system, 
the index server system is configured to: 

receive first index information from a first server providing
a first search engine, the first index information
comprising information related to documents from the 
plurality of documents which can be identified using
the first search engine; 

receive second index information from a second server 
providing a second search engine, the second index 
information comprising information related to docu- 
ments from the plurality of documents which can be 
identified using the second search engine; and 

configure the index information to be communicated to 
the user system based upon the first index information 
and the second index information. 

31. The system of claim 30 wherein to configure the index 
information to be communicated to the user system, the 
index server system is configured to combine the first index 
information and the second index information to generate 
the index information. 

32. The system of claim 26 wherein to identify the first set 
of documents from the plurality of documents using the 
index information received from the index server system, 
the user system is configured to: 

receive a search query; 

responsive to receiving the search query, search the index 
information to identify documents from the plurality of 
documents which satisfy the search query; and 

include the documents which satisfy the search query in 
the first set of documents. 

33. The system of claim 26 wherein to identify the first set 
of documents from the plurality of documents using the 
index information received from the index server system, 
the user system is configured to: 

access information related to a user of the user system; 

search the index information to identify documents from 
the plurality of documents based upon the information 
related to the user of the user system; and 

include the documents identified based upon the informa- 
tion related to the user of the user system in the first set 
of documents. 

34. The system of claim 26 wherein: 

the plurality of documents stored by the plurality of web 
server systems are a plurality of web pages, and the first 
set of documents includes a first set of web pages from 
the plurality of web pages; 

to identify the first set of documents from the plurality of 
documents using the index information received from 
the index server system, the user system is configured 
to identify a first set of URLs corresponding to the first set
of web pages; 

to receive a signal indicating selection of the first 
document, the user system is configured to receive a 
signal indicating selection of a first URL from the first 
set of URLs; and 

to access the selected first document, the user system is 
configured to access a web page corresponding to the 
selected first URL. 

35. The system of claim 26 wherein to access the selected 
first document, the user system is configured to: 

determine if the web server storing the first document is 
tracking information from the user system; and 

if the web server storing the first document is tracking 
information from the user system, prevent the web 
server from tracking information from the user system. 

36. The system of claim 35 wherein to determine if the 
web server storing the first document is tracking information 
from the user system, the user system is configured to 
determine if a cookie is associated with the first document 
accessed from the web server. 

37. A data processing system for accessing a first document
from a plurality of documents stored by a plurality of
servers, the data processing system comprising: 

a processor; 

a memory coupled to the processor, the memory config- 
ured to store a plurality of code modules for execution 
by the processor, the plurality of code modules com- 
prising: 

code for receiving index information from an index 
server, the index information comprising informa- 
tion identifying the plurality of documents stored by 
the plurality of servers and information related to the 
contents of the plurality of documents; 

code for identifying a first set of documents from the 
plurality of documents using the index information 
received from the index server, the first set of docu- 
ments including the first document, wherein the first 
set of documents is identified substantially free from 
interaction with the index server and the plurality of 
servers; 

code for receiving a signal indicating selection of the 
first document from the first set of documents; and 

responsive to the signal, code for accessing the selected 
first document from a server storing the first docu- 
ment. 

38. The system of claim 37 wherein the index information 
received by the data processing system comprises informa- 
tion for the plurality of documents collected by the index 
server, the information collected by the index server com- 
prising information identifying the plurality of documents, 
information identifying servers storing the plurality of 
documents, and information related to the contents of the 
plurality of documents.

39. The system of claim 37 wherein the code for receiving 
index information from the index server comprises: 

code for communicating a first set of criteria from the data 
processing system to the index server; 

wherein the index information received by the data pro- 
cessing system from the index server comprises infor- 
mation satisfying the first set of criteria. 

40. The system of claim 37 wherein the index information 
received by the data processing system comprises first index 
information communicated by a first server providing a first 
search engine to the index server, and second index infor- 
mation communicated by a second server providing a sec- 
ond search engine to the index server, 

wherein the first index information comprises information 
identifying documents from the plurality of documents 
which can be identified using the first search engine; 
and 

wherein the second index information comprises infor- 
mation identifying documents from the plurality of 
documents which can be identified using the second 
search engine. 

41. The system of claim 37 wherein the code for identifying
the first set of documents from the plurality of documents
using the index information received from the index
server comprises: 

code for receiving a search query; 
responsive to receiving the search query, code for search- 
ing the index information to identify documents from 
the plurality of documents which satisfy the search 
query; and 

code for including the documents which satisfy the search 
query in the first set of documents. 

42. The system of claim 37 wherein the code for identi- 
fying the first set of documents from the plurality of docu- 
ments using the index information received from the index 
server comprises: 

code for accessing information related to a user of the data 
processing system; 

code for searching the index information to identify 
documents from the plurality of documents based upon 
the information related to the user of the data process- 
ing system; and 

code for including the documents identified based upon 
the information related to the user of the data process- 
ing system in the first set of documents. 

43. The system of claim 37 wherein: 

the plurality of documents stored by the plurality of 
servers are a plurality of web pages, and the first set of 
documents includes a first set of web pages from the 
plurality of web pages; 

the code for identifying the first set of documents from the 
plurality of documents using the index information 
received from the index server comprises code for 
identifying a first set of URLs corresponding to the first
set of web pages; 

the code for receiving a signal indicating selection of the 
first document comprises code for receiving a signal 
indicating selection of a first URL from the first set of 
URLs; and 

the code for accessing the selected first document com- 
prises code for accessing a web page corresponding to 
the selected first URL. 

44. The system of claim 37 wherein the code for accessing 
the selected first document comprises: 

code for determining if the server storing the first docu- 
ment is tracking information from the data processing 
system; and 

if the server storing the first document is tracking infor- 
mation from the data processing system, code for 
preventing the server from tracking information from 
the data processing system. 

45. The system of claim 44 wherein the code for deter- 
mining if the server storing the first document is tracking 
information from the data processing system comprises code 
for determining if a cookie is associated with the first 
document accessed using the data processing system. 